Studying Properties of Czech Complex Sentences from an Annotated Corpus
Abstract
The paper addresses the analysis of complex sentences in Czech on the basis of manually annotated data. A specialized corpus explicitly describing the mutual relationships between segments and clauses in Czech complex sentences, together with a thoroughly syntactically annotated corpus, the Prague Dependency Treebank, provides a solid background for linguistic investigation. The paper presents quantitative, linguistic and structural observations that offer a number of clues for building a future algorithm for analyzing the structure of complex sentences.
Related resources
An Annotated Corpus Outside Its Original Context: A Corpus-Based Exercise Book
We present the STYX system, an electronic corpus-based exercise book of Czech morphology and syntax whose sentences are selected directly from the Prague Dependency Treebank, the largest annotated corpus of the Czech language. The exercise book offers complex sentence processing with respect to both morphological and syntactic phenomena, i.e., the exercises allow students of bas...
Segmentation of Complex Sentences
The paper describes a method of dividing complex sentences into segments: easily detectable, linguistically motivated units that may subsequently be combined into clauses and thus provide the structure of a complex sentence with regard to the mutual relationships of individual clauses. The method has been developed for Czech as a language representing languages with a relatively high degree of wo...
Prague Dependency Treebank as an Exercise Book of Czech
In the beginning there was simply linguistics. Over the years, linguistics has acquired various attributes, for example "corpus". While the term corpus is relatively young in linguistics, what it denotes, a collection of texts and speech in a language, is nothing new at all. Speaking about corpus linguistics nowadays, we have in mind the collecting of language resources in an electro...
Aspect-Level Sentiment Analysis in Czech
This paper presents pioneering research on aspect-level sentiment analysis in Czech. The main contribution of the paper is a newly created Czech aspect-level sentiment corpus, based on data from restaurant reviews. We annotated the corpus with two variants of aspect-level sentiment: aspect terms and aspect categories. The corpus consists of 1,244 sentences and 1,824 annotated aspects and is...
Czech Legal Text Treebank 1.0
We introduce a new member of the family of Prague dependency treebanks. The Czech Legal Text Treebank 1.0 is a morphologically and syntactically annotated corpus of 1,128 sentences. The treebank contains texts from the legal domain, namely the documents from the Collection of Laws of the Czech Republic. Legal texts differ from other domains in several language phenomena influenced by rather hig...
Publication date: 2011